Google Query Serving Infrastructure


[Figure labels: query, misc. servers, many corpora]



Elapsed time: 0.25s, machines involved: 1000s+ 

PageRank


PageRank™ is the core technology for measuring the importance of a page

Google's theory 

- If page A links to page B 

# Page B is important 

# The link text is irrelevant 

- If many important links point to page A 

# Links from page A are also important 
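
The idea above (a page is important when important pages link to it) can be sketched as a power iteration. This is a minimal illustration, not Google's production code: the damping factor of 0.85, the iteration count, the even redistribution of rank from pages without outgoing links, and the toy link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A and C link to B, B links to D.
toy_web = {"A": ["B"], "C": ["B"], "B": ["D"], "D": []}
ranks = pagerank(toy_web)
```

In this toy graph B outranks A because two pages link to it, and D outranks A because its only inbound link comes from the well-linked page B.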

Key Design Principles 


Software reliability 

Use replication for better request throughput and availability 

Price/performance beats peak performance 

Using commodity PCs reduces the cost of computation 
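
A rough way to see why replication helps availability: if each replica is up independently with probability a, a request can be served as long as at least one of n replicas is up. The independence assumption and the 99% figure below are illustrative and do not come from the slides.

```python
def service_availability(replica_availability, replicas):
    """Probability that at least one of `replicas` independent replicas is up."""
    return 1.0 - (1.0 - replica_availability) ** replicas


# Hypothetical numbers: each commodity replica is up 99% of the time.
single = service_availability(0.99, 1)
triple = service_availability(0.99, 3)
```

Under these assumptions three cheap replicas fail together only about once in a million cases, and all three can also share the read load while they are up.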



The Power Problem 


High density of machines (racks) 

- High power consumption: 400-700 W/ft²

# A typical data center provides 70-150 W/ft²

# Energy costs 

- Heating 

# Cooling system costs

Reducing power

- Reduce performance (cost/performance may not improve!)

- Faster hardware depreciation (cost up!) 



Parallelism 


Lookup of matching docs in a large index

--> many lookups in a set of smaller indexes, followed by a merge step

A query stream

--> multiple streams (each handled by a cluster)

Adding machines to a pool increases serving capacity
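
The split of one large index lookup into many smaller lookups plus a merge step can be sketched as follows. The shard layout (a dict from term to scored hits), the function names, and the use of threads in place of separate machines are assumptions for illustration only.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Look up one term in one index shard; return (score, doc_id) hits, best first."""
    return sorted(shard.get(term, []), reverse=True)


def search(shards, term, top_k=3):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda shard: search_shard(shard, term), shards))
    # Merge step: each partial list is already sorted best first.
    merged = heapq.merge(*partial_results, reverse=True)
    return [doc_id for _, doc_id in itertools.islice(merged, top_k)]


# Hypothetical index split across three shards.
shards = [
    {"cluster": [(0.9, "doc1"), (0.2, "doc4")]},
    {"cluster": [(0.7, "doc7")], "rack": [(0.5, "doc8")]},
    {"rack": [(0.4, "doc9")]},
]
top_docs = search(shards, "cluster")
```

Adding a shard (or a replica of a shard) only adds one more entry to `shards`, which mirrors the point that serving capacity grows with the number of machines in the pool.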



Hardware-Level Considerations


Instruction-level parallelism does not help

Multiple simple, in-order, short-pipeline cores

Thread-level parallelism

A memory system with a moderate-sized L2 cache is enough

Large shared-memory machines are not required to boost performance



GFS (Google File System) Design 

• Master manages metadata

• Data transfers happen directly between clients/chunk servers 

• Files broken into chunks (typically 64 MB) 
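
Because files are split into fixed-size chunks, a client can work out which chunks a byte range touches before it asks the master where those chunks live. The sketch below assumes the typical 64 MB chunk size from the slide; the function name and the tuple layout are hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size


def chunk_ranges(offset, length):
    """Yield (chunk_index, offset_in_chunk, nbytes) for a read of `length` bytes at `offset`."""
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        # Read up to the end of this chunk or the end of the request, whichever is first.
        nbytes = min(CHUNK_SIZE - within, end - offset)
        yield index, within, nbytes
        offset += nbytes


# A 30-byte read that straddles the boundary between chunk 0 and chunk 1.
pieces = list(chunk_ranges(CHUNK_SIZE - 10, 30))
```

Each resulting piece would then be fetched directly from a chunk server, which keeps bulk data off the master.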


GFS Usage @ Google 


• 200+ clusters 

• Many clusters of 1000s of machines 

• Pools of 1000s of clients 

• 4+ PB Filesystems 

• 40 GB/s read/write load 

- (in the presence of frequent HW failures) 

The Machinery



Servers 
• CPUs
• DRAM
• Disks 


Clusters 


Racks 

• 40-80 servers 

• Ethernet switch 

Architectural view of the storage hierarchy


One server

DRAM: 16 GB, 100 ns, 20 GB/s
Disk: 2 TB, 10 ms, 200 MB/s


[Figure label: Cluster Switch]


Local rack (80 servers)

DRAM: 1 TB, 300 µs, 100 MB/s
Disk: 160 TB, 11 ms, 100 MB/s


Cluster (30+ racks)

DRAM: 30 TB, 500 µs, 10 MB/s
Disk: 4.80 PB, 12 ms, 10 MB/s
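
The latency and bandwidth figures above can be combined into a rough estimate of how long it takes to fetch a block of data from each level. The model (latency plus size divided by bandwidth), the tier names, and the 1 MB example size are simplifying assumptions; 1 MB is taken as 10^6 bytes.

```python
# (latency in seconds, bandwidth in bytes per second), taken from the hierarchy above.
TIERS = {
    "local DRAM": (100e-9, 20e9),
    "local disk": (10e-3, 200e6),
    "rack DRAM": (300e-6, 100e6),
    "rack disk": (11e-3, 100e6),
    "cluster DRAM": (500e-6, 10e6),
    "cluster disk": (12e-3, 10e6),
}


def fetch_time(tier, nbytes):
    """Estimate seconds to fetch nbytes from a tier: latency + transfer time."""
    latency, bandwidth = TIERS[tier]
    return latency + nbytes / bandwidth


# Estimated time to fetch 1 MB from every tier.
one_mb = {tier: fetch_time(tier, 1e6) for tier in TIERS}
```

With these numbers, fetching 1 MB from another server's DRAM in the same rack (about 10 ms) is estimated to be faster than reading it from the local disk (about 15 ms), while anything across the cluster switch is dominated by the 10 MB/s bandwidth.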

Clusters through the years


"Google" circa 1997 (google.stanford.edu)


Google (circa 1999) 


Google Data Center (Circa 2000) 


Google (new data center 2001) 

Current Design 


• In-house rack design 

• PC-class motherboards 

• Low-end storage and networking hardware 

• Linux + in-house software

Container Datacenter

Multicore Computing

Comparison Between Custom-Built and High-End Servers


              Typical x86-based server   Custom-built x86-based server   Difference

PROCESSORS    8 2-GHz Xeon CPUs          176 2-GHz Xeon CPUs             22x

RAM           64 Gbytes of RAM           176 Gbytes of RAM               3x

DISK SPACE    8 Tbytes of disk space     7 Tbytes of disk space          -1 TB

PRICE         $758,000                   $278,000                        -$480,000
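
Using the figures from the comparison above, the price/performance argument can be made concrete by dividing each price by the resources it buys. The dictionary layout and function name are hypothetical; only the CPU counts, RAM sizes, and prices come from the slide.

```python
SYSTEMS = {
    "typical server": {"cpus": 8, "ram_gb": 64, "price_usd": 758_000},
    "custom-built": {"cpus": 176, "ram_gb": 176, "price_usd": 278_000},
}


def cost_per_unit(system, resource):
    """Price in dollars divided by the amount of one resource (e.g. 'cpus')."""
    spec = SYSTEMS[system]
    return spec["price_usd"] / spec[resource]


typical_per_cpu = cost_per_unit("typical server", "cpus")
custom_per_cpu = cost_per_unit("custom-built", "cpus")
cpu_cost_ratio = typical_per_cpu / custom_per_cpu
```

On these numbers a CPU costs roughly $94,750 in the typical server and roughly $1,580 in the custom-built system, a gap of about 60x per CPU.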

Implications of the Computing Environment


• Stuff Breaks 

• If you have one server, it may stay up three years (1,000 days) 

• If you have 10,000 servers, expect to lose ten a day 

• “Ultra-reliable” hardware doesn’t really help 

• At large scale, super-fancy reliable hardware still fails, albeit less often 

- software still needs to be fault-tolerant

- commodity machines without fancy hardware give better performance/$

• Reliability has to come from the software 

• Making it easier to write distributed programs 
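
The "ten a day" figure follows from simple arithmetic: if one server lasts about 1,000 days between failures, a fleet of 10,000 sees about 10,000 / 1,000 failures per day. The sketch below assumes independent failures with a constant daily failure probability of 1/1,000; the function names are illustrative.

```python
def expected_failures_per_day(servers, mtbf_days=1000):
    """Expected number of server failures per day across the fleet."""
    return servers / mtbf_days


def prob_any_failure_today(servers, mtbf_days=1000):
    """Probability that at least one server fails today, assuming independence."""
    return 1.0 - (1.0 - 1.0 / mtbf_days) ** servers


fleet_failures = expected_failures_per_day(10_000)
one_server_risk = prob_any_failure_today(1)
fleet_risk = prob_any_failure_today(10_000)
```

A single server has a 0.1% chance of failing on a given day, while a 10,000-server fleet is almost certain to lose at least one machine, which is why fault tolerance has to live in the software.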

Infrastructure for Search Systems


• Several key pieces of infrastructure:

- GFS 

- MapReduce 

- BigTable 

MapReduce


• A simple programming model that applies to many large-scale computing problems

• Hide messy details in MapReduce runtime library:

- automatic parallelization

- load balancing

- network and disk transfer optimizations

- handling of machine failures

- robustness

- improvements to core library benefit all users of library!

Typical problem solved by MapReduce


• Read a lot of data 

• Map: extract something you care about from each record 

• Shuffle and Sort 

• Reduce: aggregate, summarize, filter, or transform 

• Write the results 

• Outline stays the same, map and reduce change to fit the problem
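
The outline above can be sketched in a few lines with word counting as the problem. This is a single-process illustration of the programming model only; it is not the MapReduce library's API, and the names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical. The real runtime would run the map and reduce calls on many machines and handle failures.

```python
from collections import defaultdict


def map_fn(record):
    """Map: extract what we care about from one record (here, each word)."""
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: aggregate all values seen for one key (here, sum the counts)."""
    return key, sum(values)


def map_reduce(records, map_fn, reduce_fn):
    """Run map, then shuffle and sort by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle: group values by key
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]  # sort, then reduce


word_counts = map_reduce(["the web the index", "index the web"], map_fn, reduce_fn)
```

Swapping in a different `map_fn` and `reduce_fn` (for example, emitting link targets and counting inbound links) changes the problem without touching `map_reduce`, which is the point of the fixed outline.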

Conclusions


• For a large-scale web service system like Google

- Design algorithms that can be easily parallelized

- Design the architecture using replication to achieve distributed computing/storage and fault tolerance

- Be aware of the power problem, which significantly restricts the use of parallelism